{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **演示0103：数组元素筛选**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **案例1：直接给定下标索引**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **通过下标索引数组返回多个元素** \n",
    ">* 不仅可以使用 *[start:end:step]* 形式来返回数组中的某个切片，也可以直接指定一个一维索引数组，返回每个索引对应的元素\n",
    ">* *take*方法可以起到类似的作用\n",
    ">* 对于二维数组，仅指定行索引数组，或仅指定列索引数组，可以返回对应的行或列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[4 3 6]\n",
      "[4 3 6]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.array([4, 3, 5, 7, 6, 8])\n",
    "indices = [0, 1, 4]    # 指定返回索引为0，1，4的三个元素\n",
    "b = np.take(a, indices)\n",
    "c = a[indices]\n",
    "print(b)\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3 4 5]\n",
      " [4 5 6 7 8]]\n",
      "[[1 4]\n",
      " [2 5]\n",
      " [3 6]\n",
      " [4 7]]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7],[4,5,6,7,8]])\n",
    "indices = [0, 3]    # 索引为0和3\n",
    "print(a[indices])    # 索引为0和3的两行\n",
    "print(a[:, indices])    #索引为0和3的两列"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **从二维数组中返回指定行和列索引的数据**  \n",
    ">* 同时给定行索引数组$rows$和列索引数组$cols$：$a[rows, cols]$\n",
    ">  * 此时rows和cols的对应位置元素将组合成若干个**维度索引对**，每个维度索引对分别给定一个行索引和一个列索引\n",
    ">  * 然后从数组中返回每个维度索引对对应的元素，形成一个一维数组\n",
    ">* 这种情况下，要求rows和cols必须能够组合。例如：\n",
    ">  * $rows=[1,3], cols=[0,4]$，此时可以组合成2个维度索引对：$[[1,0],[3,4]]$\n",
    ">  * $rows=[1,2,3], cols=[0,1,1]$，此时可以组合成3个维度索引对：$[[1,0],[2,1],[4,1]]$\n",
    ">  * $rows=[1,2,3], cols=[4]$, 此时可以组合成3个维度索引对：$[[1,4],[2,4],[3,4]]$。可以看到$cols$虽然只有1个元素，但是可以与$rows$中每个元素分别组合\n",
    ">  * $rows=[1,2,3], cols[0,4]$，此时rows与cols无法进行合理的组合，因此将出现错误\n",
    ">* 先给定行索引数组，再从返回的行中给定列索引数组：$a[rows][:, cols]$\n",
    ">  * $a[rows]$返回指定行索引的所有行集合b，然后$b[:,cols]$用于返回行集合$b$中$cols$指定的列集合。请注意不要漏写$:$\n",
    ">  * 特别要注意：$rows,cols$必须是数组形式(即：使用[]将各个索引包起来)\n",
    ">  * 如果写成$a[row][cols]$形式，则被解释为：先返回指定行索引的所有行集合$b$，然后从$b$中返回由$cols$指定索引的所有行集合$c$。两次返回的都是行的子集和，并没有对列进行选择。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2 8]\n",
      "[2 4 5]\n",
      "[6 7 8]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7],[4,5,6,7,8]])\n",
    "print(a[[1,3],[0,4]])    # 返回维度索引对[1,0]和[3,4]对应的元素\n",
    "print(a[[1,2,3],[0,1,1]])\n",
    "print(a[[1,2,3],[4]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 5]\n",
      " [3 7]\n",
      " [4 8]]\n",
      "5\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,2,3,4,5],[2,3,4,5,6],[3,4,5,6,7],[4,5,6,7,8]])\n",
    "print(a[[0,2,3]][:, [0,4]])    # 返回行索引为0,2,3，列索引为0,4的子矩阵\n",
    "#print(a[[0,2,3]][[0,4]])    # 错误，a[[0,2,3]]仅返回了3行数据的子集合，但[0,4]要返回子集合中索引为0和4的行\n",
    "print(a[[0,2,3]][0,4])    # 忘记将0,4写成数组形式，因此0相当于子集中的行索引，4相当于自己中的列索引"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **通过下标索引数组返回的子集合，是对原有数组的拷贝，并不共享数据内存**  \n",
    ">* 与 *[start:end:step]* 形式的数组切片不同，数组切片返回是与原始数组共享内存的数据；而通过索引数组返回的子集合，则是对原始数组中数据的拷贝"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 3 4 5]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1,2,3,4,5])\n",
    "a[[0,1]][0] = 100\n",
    "print(a)    # a不会发生变化 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **案例2：基于元素值间接给定下标索引**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **返回非零元素**  \n",
    ">* *nonzero*函数返回数组中所有非零元素的下标索引\n",
    ">* 对于二维数组，该函数返回两个下标索引数组。其中，第一个数组存放所有非零元素的行索引，第二个数组存放所有非零元素的列索引。将两个数组中对应元素组合，就得到所有非零元素的维度索引对"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(array([0, 3, 4, 6], dtype=int64),)\n",
      "[1 3 5 8]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1,0,0,3,5,0,8])\n",
    "b = np.nonzero(a)    # 非零元素的索引组成的一维数组\n",
    "print(b)\n",
    "c = a[b]    # 返回对应的非零元素值\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(array([0, 1, 2, 2], dtype=int64), array([0, 1, 0, 1], dtype=int64))\n",
      "[1 2 1 1]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,0,0], [0,2,0], [1,1,0]])\n",
    "b = np.nonzero(a)   # 返回两个数组，第一个数组存放行索引，第二个数组存放列索引\n",
    "print(b)\n",
    "c = a[b]    # 行、列索引对应位置组合，形成维度索引对[0,0],[1,1],[2,0],[2,1]，共4个\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **根据元素值大小进行过滤**  \n",
    ">* *np.where*函数可返回元素值大小满足某种条件的所有元素下标\n",
    ">* 对于二维数组，其返回值与*np.nonzero*的类似"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(array([7, 8, 9], dtype=int64),)\n",
      "[21 24 27]\n"
     ]
    }
   ],
   "source": [
    "a = np.arange(10)*3\n",
    "b = np.where(a>20)   # 返回所有元素值大于20的元素下标索引\n",
    "print(b)\n",
    "print(a[b])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **检查数组及其子集合的数据内存是否共享**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 3 4 5]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1,2,3,4,5])\n",
    "b = np.where(a>0)\n",
    "c = a[b]    # 此处相当于通过索引数组获得a的子集，这个过程将执行数据拷贝\n",
    "c[0] = 100\n",
    "print(a)    # a不会发生变化"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **案例3：基于True/False条件返回对应的元素**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **通过指定一个*True/False*数组，返回所有*True*位置的元素**  \n",
    ">* 对数组执行逻辑判断，将返回一个*True/False*数组，然后可以根据该数组返回*True*对应位置的元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0 3 4]\n"
     ]
    }
   ],
   "source": [
    "a = np.arange(5)\n",
    "b = np.array([True, False, False, True, True])\n",
    "print(a[b])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[False False False False False False  True  True  True  True]\n",
      "[6 7 8 9]\n"
     ]
    }
   ],
   "source": [
    "a = np.arange(10)\n",
    "b = a > 5    # 根据判断，获得True/False数组\n",
    "print(b)\n",
    "print(a[b])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
